Abstract

Networks  that  solve  specific  visual  tasks,  such  as  the  evaluation  of  spatial  relations  with 
hyperacuity  precision,  can  be  easily  synthesized  from  a  small  set  of  examples.  This  may 
have  significant  implications  for  the  interpretation  of  many  psychophysical  results  in  terms 
of  neuronal  models. 
1 A general framework for psychophysical modeling


We  wish  to  propose  a  single,  new  hypothesis  instead  of  the  many  specific  models  that  are 
invoked  to  explain  a  broad  range  of  visual  abilities  as  measured  in  psychophysical  tests.  We 
will  consider,  in  particular,  hyperacuity  tasks  as  an  example  for  our  claims. 

For  any  given  visual  competence,  it  is  tempting  to  conjecture  a  specific  algorithm  and  a 
corresponding neural circuitry. It has often been implicitly assumed that this machinery may
be  hardwired  in  the  brain.  This  extreme  point  of  view,  if  taken  seriously,  may  quickly  lead  to 
absurd  consequences.  Consider  for  instance  the  many  different  hyperacuity  tasks,  some  of  which 
are  outlined  in  Figure  1.  The  underlying  reason  for  the  spectacular  performance  of  human 
subjects in these tasks is that the information sampled by the photoreceptors and relayed to
the  brain  does  contain  the  information  necessary  for  precise  localization  of  image  features,  since 
the  spacing  between  photoreceptors  and  the  eye's  optics  satisfy  (in  the  fovea)  the  constraints 
of  the  sampling  theorem  [5].  More  specifically,  it  has  been  shown  that,  in  principle,  spatial 
mechanisms that account for grating resolution are sensitive enough to support hyperacuity-level
performance  [13,4,26].  Furthermore,  some  of  the  hyperacuity  tasks  can  be  solved  by  detecting 
“secondary”  cues  such  as  luminance  difference  (as  in  the  bisection  task)  or  orientation  (as  in 
the  detection  of  vertical  vernier  stimuli).  The  detailed  structure  of  the  neural  circuitry  that 
subserves  the  detection  of  these  cues,  or  hyperacuity  performance  in  other  tasks  is,  however, 
unknown. 

Notice  that  the  idea  of  a  fine-grid  reconstruction  of  the  image  in  some  layer  of  the  cortex 
[1,5]  is  unsatisfactory,  because  it  still  requires  a  homunculus  looking  at  the  reconstructed  image 
and  applying  a  different  routine  for  each  specific  hyperacuity  task.  We  propose  instead  [16] 
that  the  brain  may  be  able  to  synthesize  -  possibly  in  the  cortex  -  appropriate  modules  for 
specific  tasks  after  a  quick  training  phase  in  which  it  is  exposed  to  examples  of  the  task.  In 
most  psychophysical  experiments,  subjects  are  actually  shown  several  examples  of  the  task 
before testing takes place. Hyperacuity tests, in particular, require a significant training period
in order to achieve good performance (thresholds typically decrease by a factor of two to four
during the first several hundreds of stimulus presentations [24]; on the other hand, some subjects
have thresholds of 10" or less upon the first testing). A broad prediction of our conjecture is
that  almost  any  psychophysical  task  could  be  performed  after  suitable  training,  provided  the 
necessary  information  is  available  in  the  stimulus. 

Synthesizing a module from examples for a specific task may often be regarded as approximating
a multivariate function from sparse data. An efficient scheme for the approximation of
smooth  functions  was  proposed  recently  under  the  name  of  HyperBF  networks  [19].  Detailed 
descriptions  of  it,  its  theoretical  underpinnings  and  its  performance  can  be  found  in  [19],  [16], 
Figure 1: Examples of six tasks in which human subjects perform at hyperacuity levels (that
is, exhibit resolution finer than the spacing between individual photoreceptors). Many other
variations are possible, such as, for instance, a horizontal vernier.
Figure 2: (a) A network representation of approximation by Hyper Basis Functions. (b) shows
an equivalent interpretation of (a) for the case of Gaussian radial basis functions. Gaussian
functions can be synthesized as the product of two-dimensional Gaussian receptive fields operating
on retinotopic maps of features. The solid circles in the image plane represent the 2D
Gaussians associated with the first radial basis function, which represents the first view of the
object. The dashed circles represent the 2D receptive fields that synthesize the Gaussian radial
function associated with another view. The Gaussian receptive fields transduce positions
of features, represented implicitly as activity in a retinotopic array, and their product “computes”
the radial function without the need of calculating norms and exponentials explicitly.

[18]. The module is an approximation of a multivariate function in terms of basis functions
with parameter values that have to be found (i.e., “learned”) from the data (i.e., the examples).
The expansion has the form

$$f^*(\mathbf{x}) = \sum_{\alpha=1}^{n} c_\alpha \, G(\|\mathbf{x} - \mathbf{t}_\alpha\|_W^2) + p(\mathbf{x}) \qquad (1)$$

where the parameters $\mathbf{t}_\alpha$, which correspond to the centers of the basis functions, and the coefficients
$c_\alpha$ are unknown, and are in general much fewer than the data points ($n < N$). The norm is a
weighted norm


$$\|\mathbf{x} - \mathbf{t}_\alpha\|_W^2 = (\mathbf{x} - \mathbf{t}_\alpha)^T W^T W (\mathbf{x} - \mathbf{t}_\alpha)$$

where  W  is  an  unknown  square  matrix  and  the  superscript  T  indicates  the  transpose.  In  the 
simple case of diagonal $W$ the diagonal elements $w_i$ assign a specific weight to each input
coordinate, determining in fact the units of measure and the importance of each feature [19].
Equation  1  can  be  implemented  by  the  network  of  Figure  2.  The  parameters  c,  t,  W  are  searched 
for  during  learning  by  minimizing  the  error  functional  defined  as 


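$$H[f^*] = H_{\mathbf{c},\mathbf{t},W} = \sum_{i=1}^{N} (\Delta_i)^2,$$

where

$$\Delta_i = y_i - f^*(\mathbf{x}_i) = y_i - \sum_{\alpha=1}^{n} c_\alpha \, G(\|\mathbf{x}_i - \mathbf{t}_\alpha\|_W^2).$$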

Iterative methods of the gradient descent type can be used for the minimization of $H$. An
even simpler method that does not require calculation of derivatives is to look for random
changes (controlled in appropriate ways) in the parameter values that reduce the error (cf.
[14,2]).  The  interpretation  of  the  network  of  Figure  2  is  the  following.  The  centers  of  the 
basis  functions  are  similar  to  prototypes,  since  they  are  points  in  the  multidimensional  input 
space.  Each  unit  computes  a  (weighted)  distance  of  the  inputs  from  its  center  and  applies  to 
it  the  radial  function.  In  the  case  of  the  Gaussian,  a  unit  will  be  the  most  active  when  the 
input  exactly  matches  its  center.  The  output  of  the  network  is  a  linear  superposition  of  the 
activities  of  all  the  basis  functions,  plus  direct,  weighted  connections  from  the  inputs  (the  linear 
terms  of  p(x))  and  from  a  constant  input  (the  constant  term).  Notice  that  in  the  limit  case  of 
the  basis  functions  approximating  delta  functions,  the  system  becomes  equivalent  to  a  look-up 
table  holding  the  examples. 
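
As an illustration only, Equation 1 can be evaluated directly for the special case of a Gaussian basis function and a diagonal W. The sketch below is not the simulation code used in this work: the function names, the choice G(r^2) = exp(-r^2), and the toy centers and coefficients are assumptions made for the example. It also illustrates the look-up-table limit, in which large weights make each unit respond only at its own center.

```python
import math

def weighted_sq_dist(x, t, w):
    """Squared weighted norm ||x - t||_W^2 for a diagonal W with entries w."""
    return sum((wi * (xi - ti)) ** 2 for xi, ti, wi in zip(x, t, w))

def hyperbf(x, centers, coeffs, w, linear=None, const=0.0):
    """Evaluate sum_a c_a G(||x - t_a||_W^2) + p(x), with G(r2) = exp(-r2).

    p(x) is taken to be const + linear . x (assumed form of the polynomial).
    """
    radial = sum(c * math.exp(-weighted_sq_dist(x, t, w))
                 for c, t in zip(coeffs, centers))
    poly = const + sum(a * xi for a, xi in zip(linear or [], x))
    return radial + poly

# Two hypothetical prototypes in a 2-D input space, with outputs +1 and -1.
centers = [[0.0, 0.0], [1.0, 0.0]]
coeffs = [1.0, -1.0]

# Broad units: both units contribute at the first center.
at_center = hyperbf([0.0, 0.0], centers, coeffs, w=[1.0, 1.0])

# Very narrow units: the module behaves like a table of stored examples.
narrow_hit = hyperbf([1.0, 0.0], centers, coeffs, w=[50.0, 50.0])
narrow_miss = hyperbf([0.5, 0.0], centers, coeffs, w=[50.0, 50.0])
```

With unit weights the two units overlap, so the output at the first center is 1 - exp(-1) rather than 1. With large weights the output equals the stored coefficient at a center and vanishes between centers.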

2  Example:  simulated  experiments  in  hyperacuity 

In the preceding section we have proposed that hyperacuity may consist of tasks learned by
subjects from a few examples (mostly) in the psychophysical lab, exploiting modules (in the
cortex?) that perform learning from examples, that is, multivariate function approximation
from  sparse  data.  This  hypothesis,  if  pushed  to  its  extreme  version,  may  represent  a  rather 
general  framework  for  psychophysical  modeling.  To  justify  this  proposal,  we  have  conducted  a 
series  of  simulated  psychophysical  experiments,  in  which  a  HyperBF  module  has  been  trained 
to  perform  several  different  hyperacuity  tasks.  The  details  of  the  experiments  are  described 
below. 

2.1  Simulation  details 

The  input  to  the  module  was  an  array  of  “photoreceptors”  whose  activity  corresponded  to  the 
input  image,  blurred  by  the  eye’s  optics.  There  were  eight  receptors,  positioned  randomly  on  a 
Figure 3: An illustration of the vernier acuity task: the subject has to tell whether the upper
bar is to the left or to the right of the lower one. Human subjects (and the HyperBF simulation)
perform this task at hyperacuity levels, that is, the minimum discernible horizontal displacement
of the two bars is much smaller than the average distance between adjacent photoreceptors. The
photoreceptor mosaic is shown superimposed on the stimulus. Each cone is shown as a circle that
represents the Gaussian spread of a point source shining at the corresponding retinal location.
This spread is due to the low-pass characteristics of the optics of the eye. Our simulation does
not require positioning the “receptors” at precisely defined locations.

loose  4x2  grid  (see  Figure  3).  Each  of  the  receptors  calculated  its  response  by  integrating  the 
input over a region of the “retina” shaped as a Gaussian, with two space dimensions (σ = 30")
and one time dimension (σ = 0.5 units). The space dimensions spanned the entire 180" × 360"
patch  of  the  “retina”,  while  the  time  dimension  had  an  extent  of  ±1  unit.  The  8-component 
vector  of  receptor  outputs  constituted  the  input  to  the  HyperBF  module,  which  was  trained  to 
produce an output of +1 for one sense of the input vernier displacement, and -1 for the other.
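
The input stage described above can be sketched as follows. This is a simplified, static illustration rather than the simulation itself: the time dimension is omitted, the 2-column by 4-row layout, the amount of jitter, the bar length and the sampling step are all assumed values, and every function name is hypothetical. Only the number of receptors, the 180" × 360" patch and the spatial spread of 30" are taken from the text.

```python
import math
import random

SIGMA = 30.0  # spatial spread of a receptor in arcsec (value from the text)

def make_mosaic(rng, jitter=10.0):
    """Eight receptors on a loose grid over a 180" x 360" patch.

    Assumed layout: 2 columns by 4 rows, each position perturbed uniformly.
    """
    cols = [45.0, 135.0]
    rows = [45.0, 135.0, 225.0, 315.0]
    return [(x + rng.uniform(-jitter, jitter), y + rng.uniform(-jitter, jitter))
            for y in rows for x in cols]

def vernier_points(offset, cx=90.0, cy=180.0, gap=3.0, length=60.0, step=3.0):
    """Sample two vertical bars; the upper bar is displaced by `offset` arcsec."""
    n = int(length / step) + 1
    upper = [(cx + offset, cy + gap / 2 + i * step) for i in range(n)]
    lower = [(cx, cy - gap / 2 - i * step) for i in range(n)]
    return upper + lower

def receptor_responses(mosaic, points, sigma=SIGMA):
    """Integrate the sampled stimulus under each receptor's Gaussian profile."""
    return [sum(math.exp(-((px - rx) ** 2 + (py - ry) ** 2) / (2 * sigma ** 2))
                for px, py in points)
            for rx, ry in mosaic]

rng = random.Random(0)
mosaic = make_mosaic(rng)
# 8-component input vectors for the two senses of a 6" vernier displacement.
left = receptor_responses(mosaic, vernier_points(-6.0))
right = receptor_responses(mosaic, vernier_points(+6.0))
```

The two vectors differ only slightly, because the displacement is much smaller than the receptor spread; the learning module has to map this small difference onto the outputs +1 and -1.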

The  performance  of  the  module  was  estimated  by  measuring  the  absolute  error,  that  is,  the 
distance  between  the  actual  output  (which  could  be  any  number  between  -1  and  +1;  for  a 
proof  see  [6])  and  the  desired  output  (±1;  see  Figure  4).  Without  going  into  the  details,  we 
point  out  that  the  absolute  output  error  is  a  good  analog  of  acuity  threshold,  since  the  two  are 
related  monotonically. 
Figure 4: The relationship between the performance index used in the simulations (the
absolute output error of the HyperBF module) and the acuity threshold. The probability
density  of  the  output  is  shown  as  a  distribution  centered,  say,  at  r  >  0,  whose  tail  extends 
across  0  to  the  other  half  of  the  ±1  range  of  possible  values.  The  area  A  under  the  tail  of 
the  distribution  indicates  the  probability  of  erroneous  response,  given  the  statistics  represented 
by  the  mean  and  standard  deviation  of  error  (the  two  parameters  we  have  measured  in  the 
simulations).  The  acuity  threshold,  in  turn,  can  be  related  to  the  probability  of  erroneous 
response  through  probit  analysis. 
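
The tail-area argument of Figure 4 can be made concrete under one added assumption, namely that the output density is Gaussian (the shape of the distribution is not specified above). The function name and the example values are hypothetical. For a desired output of +1, a mean absolute error e places the mean output at 1 - e, and the probability of an erroneous response is the area of the density below zero.

```python
import math

def error_probability(mean_abs_error, std_error):
    """Tail area below zero for an assumed Gaussian output, desired value +1."""
    mean_output = 1.0 - mean_abs_error
    z = mean_output / std_error
    # Standard normal tail probability P(Z < -z), written with erfc.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

p_small_error = error_probability(0.1, 0.3)
p_large_error = error_probability(0.5, 0.3)
p_chance = error_probability(1.0, 0.3)  # mean output at zero
```

For a fixed spread, the probability of an erroneous response grows monotonically with the mean error, which is the property that makes the absolute output error usable as a stand-in for the threshold.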

2.2  Replication  of  the  basic  psychophysical  findings  for  the  vernier  task 

The  HyperBF  module  coupled  to  the  input  mechanism  described  above  successfully  replicated, 
after  a  training  phase  typically  consisting  of  about  50  “examples”,  the  following  four  basic 
findings of the psychophysics of hyperacuity in human subjects:

• The equivalent acuity threshold was significantly lower than the spacing of the receptors
in the simulated retina ([10,22]; Figure 5).

• The threshold improved with increasing vertical separation of the two segments comprising
the vernier stimulus ([24]; Figure 6). We note that in human subjects this improvement
reverses with further increase in the vertical separation; this phenomenon was also
replicated by the model.

• The threshold deteriorated with increasing orientation difference between training and
testing trials. This deterioration was more pronounced for shorter stimuli ([21]; Figure 7).

•  Performance  remained  at  hyperacuity  levels  when  the  stimuli  moved  across  the  retina, 
and  was  the  highest  when  the  velocity  of  the  stimulus  translation  was  the  same  during 
training  and  testing  ([23];  Figure  8). 

Importantly,  the  hyperacuity-level  performance  was  independent  of  the  precise  location  of 
the  receptors.  At  the  same  time,  different  quasi-random  receptor  mosaics  yielded  different 
thresholds,  sometimes  by  as  much  as  a  factor  of  two.  A  similar  range  of  hyperacuity  thresholds 
is observed in human subjects, even those with full acuity and perfectly normal eyes.

2.3 Comparison among line vernier, three-point bisection and dot vernier tasks

The next experiment compared the performance of a HyperBF module in the vernier task
with that in another hyperacuity task, the three-point bisection. The stimulus in the bisection
task  consists  of  three  dots,  arranged  in  a  vertical  line,  at  an  approximately  even  spacing.  The 
subject  has  to  determine  whether  the  middle  dot  is  above  or  below  the  midpoint  of  the  segment 
formed  by  the  other  two  dots.  The  HyperBF  module  learned  this  hyperacuity  task  just  as  easily 
as  it  did  in  the  line  vernier  case. 

Another  experiment  made  a  comparison  between  the  line  vernier  task  and  a  similar  one  in 
which  each  of  the  line  segments  has  been  replaced  by  two  dots  (situated  at  its  endpoints).  The 
network  learned  this  task,  as  it  did  previously  in  the  line  vernier  and  the  bisection  cases.  The 
comparison  between  the  two  vernier  tasks  appears  in  Figure  9.  The  better  performance  of  the 
HyperBF  module  in  the  dot  vernier  task  for  small  X-offsets  parallels  a  recent  surprising  finding 
with  human  subjects  (M.  Fahle,  personal  communication). 

2.4  Replication  of  the  decrease  of  vernier  threshold  with  practice 

A  major  characteristic  of  human  performance  in  hyperacuity  tasks  is  the  gradual  and  constant 
improvement  of  the  threshold,  which  continues,  albeit  at  a  slow  rate,  after  ten  thousand  trials 
([9];  see  the  appendix).  We  have  replicated  this  phenomenon  by  endowing  the  model  with  a 
learning mechanism that we call “incremental learning” (see also [3]) and that consists of two
phases.  First,  gradual  improvement  was  obtained  by  letting  the  model  perform  a  local  random 
search  in  the  space  of  HyperBF  center  coordinates.  Second,  when  the  model's  performance  on 
a  new  input  was  markedly  inadequate  (in  comparison  with  recent  history),  that  input  was 
adjoined  to  the  model  as  an  additional  center  (prototype).  In  the  appendix  we  discuss  how 
Figure 5: Mean error of the synthesized module vs. X-offset of the vernier stimulus. The
module was trained to output 1 for left offset and -1 for right offset. Consequently, error
of 0.1 corresponds to high performance (bars in this and other figures denote ±1 standard
error of the mean). The values of X-offset along the abscissa are the lower bounds of an octave
range (e.g., 4 pixels means that the offsets were uniformly distributed between 4 and 8 pixels;
in all our simulations the scale was 10 pixels to 30"). The three curves correspond to three
training/testing combinations. In the first one (•), the same X-offset range was used both for
training and testing. In the other two combinations, the testing range was one-half
and twice as large as the training range, respectively. Note that X-offsets which yielded high
performance (mean error smaller than 0.05) are much smaller than the photoreceptor spacing
(6", compared to about 30").

the  incremental  learning  algorithm  can  be  naturally  extended  to  work  even  without  explicit 
examples,  that  is  without  feedback,  for  appropriate  tasks. 

The  algorithm  for  adjusting  the  positions  of  the  existing  centers  was  as  follows.  For  each  new 
input,  the  system  made  between  10  and  100  random  changes  in  the  value  of  a  randomly  chosen 
coordinate  of  a  center  (the  amplitude  of  the  change  was  about  ten  percent  of  coordinate  value). 
After  each  change  the  error  for  that  particular  input  was  recalculated.  If  the  new  error  was 
lower (and, with a small probability, if the error increased), the change was incorporated into
the  system,  otherwise  the  change  was  reversed  (cf.  [2]).1  If  at  any  stage  during  the  simulated 

1 The probability of keeping a change that led to a higher error could be decreased with time, as in the
Figure 6: Mean error of the synthesized module vs. Y-offset of the vernier stimulus, by X-offset.
The four curves correspond to four values of X-offset. Once the X-offset is high enough to guarantee
good performance (curves 2, 3 and 4), increasing the Y-offset improves the performance
level, as it does in human subjects.

experiment the current input was too distant from any of the existing HyperBF centers, that
input was adjoined to the model as a new center (cf. learning by example acquisition in the CLF
model of object recognition [8]; see also [18]). The performance of the resulting algorithm that
combined  adjustment  of  existing  centers  with  recruitment  of  new  centers  is  shown  in  Figure  10. 
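
A minimal sketch of one trial of this combined procedure is given below. It follows the description in this section (random changes of about ten percent in a randomly chosen center coordinate, kept if the error drops or, with a small probability, if it rises; recruitment of a new center when the input is far from all existing ones), but several details are assumptions made for the example: unit-width Gaussian units, the activity threshold used to decide that an input is "too distant", and the rule that sets the coefficient of a recruited center to the current residual. All names are hypothetical.

```python
import math
import random

def activity(x, t):
    """Gaussian unit: largest (1.0) when the input x coincides with center t."""
    return math.exp(-sum((xi - ti) ** 2 for xi, ti in zip(x, t)))

def predict(x, centers, coeffs):
    """Linear superposition of the unit activities."""
    return sum(c * activity(x, t) for c, t in zip(coeffs, centers))

def incremental_step(x, y, centers, coeffs, rng, n_changes=20,
                     rel_amp=0.1, p_uphill=0.01, min_activity=0.05):
    """One trial: recruit a center if x is far from all, else random search."""
    if max(activity(x, t) for t in centers) < min_activity:
        # Assumed rule: the new coefficient cancels the residual at x.
        residual = y - predict(x, centers, coeffs)
        centers.append(list(x))
        coeffs.append(residual)
        return "recruited"
    err = abs(y - predict(x, centers, coeffs))
    for _ in range(n_changes):
        t = rng.choice(centers)          # randomly chosen center
        k = rng.randrange(len(t))        # randomly chosen coordinate
        old = t[k]
        t[k] = old * (1.0 + rng.uniform(-rel_amp, rel_amp))
        new_err = abs(y - predict(x, centers, coeffs))
        if new_err < err or rng.random() < p_uphill:
            err = new_err                # keep the change
        else:
            t[k] = old                   # reverse the change
    return "adjusted"

rng = random.Random(1)
centers = [[1.0, 2.0], [2.0, 1.0]]
coeffs = [1.0, -1.0]
x, y = [1.2, 1.9], 1.0
err_before = abs(y - predict(x, centers, coeffs))
status = incremental_step(x, y, centers, coeffs, rng, p_uphill=0.0)
err_after = abs(y - predict(x, centers, coeffs))
far_status = incremental_step([-9.0, -9.0], -1.0, centers, coeffs, rng)
```

With `p_uphill=0.0` the search is purely greedy, so the error on the current input cannot increase within a trial. The comparison with "recent history" mentioned in section 2.4 is not modeled here.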

3  Conclusions 

3.1  Discussion 

The skeleton model described in the preceding sections is specific enough to be put to a psychophysical
test. One possible way to do so is to test the prediction of the model regarding
generalization of performance from a well-practiced to an unfamiliar range of inputs. Consider,
for concreteness’ sake, the vernier acuity task. If the human visual system relies on a memory-based
mechanism such as HyperBF interpolation to solve this problem, a drop in performance
(that  is,  an  increase  in  the  error  rate)  is  expected  when  the  range  of  the  stimuli  is  suddenly 

simulated  annealing  approach  to  optimization  [12]  (this  feature,  however,  appeared  to  be  unnecessary  for  our 
purposes). 


Figure 7: Mean error of the synthesized module vs. orientation of the stimulus (shown along
the abscissa as the lower bound of a 1-octave range, in degrees), by stimulus length. X-offset
was between 4 and 8 pixels (12" to 24"), Y-offset was 1 pixel (3"). The four curves correspond
to four values of segment length, from 10 to 40 pixels (30" to 120"). In general, performance is
seen to deteriorate with increased orientation range.

changed (e.g., if the verniers are made smaller by a factor of two or more in comparison with
their values during training). The same prediction holds for a change to a different hyperacuity
task  (say  from  the  top  left  stimulus  in  Figure  1  to  the  bottom  right  one).  If  regression  analysis 
is  used  to  obtain  an  estimate  of  the  psychometric  function  from  error  rates,  such  a  change  in 
the  stimulus  range  would  cause  a  decrease  in  the  coefficient  of  determination  of  the  regression, 
or  in  related  measures  of  the  goodness  of  fit.  Moreover,  the  subsequent  recovery  of  performance 
should be slower if no feedback is provided after the change (even though some learning appears
to be possible even without explicit feedback: see the appendix). There are preliminary
indications that both these phenomena indeed happen in practice [9].

No  such  response  to  a  change  in  the  stimulus  range  should  be  found  if  the  visual  system 
has a built-in scale invariance mechanism. Different versions of scale-invariant models of early
visual  processing  have  been  offered  in  the  past  (e.g.,  [20]).  For  our  present  purpose,  a  simple 
scheme,  in  which  invariance  is  achieved  through  simultaneous  processing  of  the  input  at  several 
levels of resolution (corresponding to several overlapping grids of “ganglion” cells of different
size and spacing), would suffice. In such a case, the system could be prepared in advance, say,
for a reduction in the input scale (up to a certain limit), simply because the small-scale grid
Figure 8: Mean error of the synthesized module vs. velocity of the stimulus during testing.
X-offset was between 4 and 8 pixels (12" to 24"). Y-offset was 1 pixel (3"). The four curves
correspond to four values of velocity during training (same set of 4 values as the testing velocities).
In general, performance deteriorates with increased testing velocity, but to a lesser extent
if the training velocity was relatively high as well.

would  exhibit,  after  the  reduction,  a  pattern  of  activity  isomorphic  to  the  pattern  evoked  by  the 
large-scale  input  in  the  large-scale  grid  (see  Figure  11).  Finally,  we  remark  that  the  mechanism 
underlying  scale  invariance  (if  any)  could  be  probed  by  blurring  the  input  to  the  extent  that 
the  small-scale  grids,  but  not  the  large-scale  ones,  are  affected. 

In general, we expect that the cortex performs suitable pre-processing to provide approximate
invariance to certain basic transformations, without the need for explicit learning. Translation,
in addition to scale, is another obvious candidate transformation for which invariance
could be built in. The bare version of our network, described here, would not generalize from
one patch of the retina to another (though this may not be fully necessary; cf. [15]). It seems
likely that translation invariance, at least up to a certain extent, should be provided by mechanisms
preceding the learning stage (possibly related to the “focus of attention” idea). It is
possible that preprocessing mechanisms could also provide invariance to the specific stimulus
type by computing the equivalent of “place tokens”. This would enable the system to generalize
automatically (without the need for examples) from, say, line stimuli to, say, dot stimuli,
but would, of course, void to some extent the significance of our model. In any case, the
input to a learning model such as the one we have outlined should not be raw photoreceptor
Figure 9: Mean error of the synthesized module vs. X-offset of the vernier stimulus, by Y-offset.
Left: line vernier stimulus. Right: dot vernier stimulus. Note better performance in the latter
case for small X-offsets.

activities, as in our simulations, but rather pre-processed photoreceptor activities. The type of
preprocessing in human vision and the associated pseudo-invariances it supports are an experimental
question of great interest. Of course, any lack of generalization would be support for
the model. Experimental demonstration of transfer of learning with respect to translation and
scale would not represent in our view a major problem for the model, though it would require
a more complex preprocessing than the one we have simulated. Transfer of learning from one
type  of  stimulus  to  another  (see  Figure  1)  would  be  a  more  serious  blow  to  the  spirit  of  our 
model  and  therefore  a  more  critical  test  of  its  validity. 

3.2  Summary 

The  specific  implication  of  this  work  is  that  human-like  performance  in  different  hyperacuity 
tasks can be obtained by modules synthesized “on the fly” from a few examples of that task. In
view of the results reported above, we conjecture that the module responsible for hyperacuity-level
performance is synthesized in a demand-driven fashion, when the task is first performed
by  the  subject. 

More  generally,  one  may  apply  the  same  line  of  reasoning  to  other  visual  tasks  studied  by 
psychophysicists.  To  this  effect,  it  is  important  that  the  technique  we  have  used  for  learning  can 
be  implemented  as  a  simple  biologically  plausible  network  [19].  Furthermore,  this  approach  has 
recently  been  demonstrated  as  effective  in  modeling  central  aspects  of  human  performance  in 
three-dimensional object recognition [17,6]. It remains to be seen whether the above framework
Figure 10: Replication of the gradual improvement in vernier acuity with practice by the
random search technique described in section 2.4. The best linear fit to the data set has a
slope of -0.009. In other words, the HyperBF module exhibited continuous learning, just as
the human subjects did.
[Plot legend, slopes of the linear fits of error on trial number: before, -0.060; after, -0.015; overall, -0.009.]

would  prove  useful  in  unifying  the  existing  diverse  theoretical  approaches  to  the  modeling  of 
visual  perception,  and  of  brain  function  in  general. 


Appendix:  Learning  modes  of  the  HyperBF  scheme 

Incremental  learning  and  bootstrapping  in  the  absence  of  feedback 

The  HyperBF  module  must  be  allowed  to  improve  its  performance  throughout  the  testing  stage, 
with  and  without  feedback.  This  can  be  achieved  by  using  the  algorithm  that  we  described  in  the 
body  of  the  paper:  centers  are  added  when  the  model  performance  is  inadequate.  Coefficients 
are  modified  and  -  possibly  on  a  slower  time  scale  -  centers  are  moved.  Performance  is  easily 
measured  in  the  presence  of  feedback,  in  terms  of  the  error  between  the  predicted  value  and  the 
correct  one.  If  no  feedback  is  available,  it  is  still  possible  to  estimate  performance  if  the  new 
input  is  not  too  far  away  from  the  existing  centers,  so  that  the  network  can  classify  it  correctly, 
Figure 11: Scale invariance in hyperacuity tasks can be achieved in principle through simultaneous
processing of the input at several levels of resolution, corresponding to several overlapping
grids of “ganglion” cells of different size and spacing.

even if not very reliably. Thus, a small modification of the scheme makes it work in the
absence  of  feedback,  under  certain  conditions.  Imagine  that  a  few  examples  of  the  hyperacuity 
task  are  given  with  feedback,  that  is  with  the  correct  classification.  Subsequently,  new  stimuli 
are  given  without  feedback.  If  these  stimuli  are  sufficiently  similar  to  the  original  examples,  the 
network  may  be  able  to  classify  them  correctly  and  then  incorporate  them  as  new  centers  (i.e., 
templates),  effectively  bootstrapping  the  learning  process. 

Notice  that  such  incremental  learning  tasks  are  not  uncommon.  In  particular,  hyperacuity  is 
often  tested  within  the  paradigm  of  adapting  the  size  of  the  offset  to  the  subject’s  performance, 
therefore  decreasing  it  slowly  during  the  test.  Under  these  conditions,  the  offset  in  each  trial  is 
never  less  than  half  the  offset  of  the  previous  trial.  According  to  our  simulations,  the  network 
described  earlier  can  generalize  rather  well  to  offsets  of  half  the  size  (but  not  to  offsets  of  say, 
four  times  the  training  size).  The  incremental  learning  algorithm  described  in  the  main  text 
may  be  extended  in  the  following  way.  In  the  absence  of  feedback  the  network  attempts  to 
classify a new stimulus and to use it as an example for incremental learning, provided that the
classification is sufficiently reliable (that is, provided there is at least one unit in the network
which  is  sufficiently  active,  indicating  that  the  new  stimulus  is  sufficiently  close  to  one  of  the 
existing  centers). 
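
The extension to trials without feedback can be sketched as follows. The reliability test is the one stated above (at least one sufficiently active unit); the numerical threshold, the unit-width Gaussian units, the use of the sign of the output as the self-assigned label, and the rule for the new coefficient are assumptions made for this illustration, and the function names are hypothetical.

```python
import math

def activity(x, t):
    """Gaussian unit activity, equal to 1.0 when x coincides with center t."""
    return math.exp(-sum((xi - ti) ** 2 for xi, ti in zip(x, t)))

def predict(x, centers, coeffs):
    return sum(c * activity(x, t) for c, t in zip(coeffs, centers))

def bootstrap_step(x, centers, coeffs, reliable=0.5):
    """Self-label x and adopt it as a center only if some unit is active enough."""
    if max(activity(x, t) for t in centers) < reliable:
        return None  # classification deemed unreliable: the stimulus is skipped
    out = predict(x, centers, coeffs)
    label = 1.0 if out >= 0.0 else -1.0
    # Assumed rule: the new unit corrects the output at x to the self-label.
    coeffs.append(label - out)
    centers.append(list(x))
    return label

# Two examples learned with feedback, then two stimuli without feedback.
centers = [[1.0], [-1.0]]
coeffs = [1.0, -1.0]
near_label = bootstrap_step([0.8], centers, coeffs)
n_after_near = len(centers)
far_label = bootstrap_step([5.0], centers, coeffs)
n_after_far = len(centers)
```

The stimulus close to a stored example is classified and incorporated, while the distant one is ignored, so the set of centers can only grow outward gradually from the original examples.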

Learning  algorithms:  details 

The basic mechanism of learning in HyperBF networks is the computation of the optimal set
of coefficients $c_\alpha$ which relate the network's output vector to the vector whose components are
the  activities  of  the  individual  basis  function  units.  Finding  the  matrix  of  coefficients  amounts 
to  the  solution  of  a  linear  system,  provided  that  the  number  of  input/output  examples  is  the 
same  as  the  number  of  basis  functions.  If  there  are  more  examples  than  basis  functions,  the 
resulting  overconstrained  system  can  be  solved  by  pseudoinverse  methods.  A  one-shot  method 
of  this  type  does  not  appear  to  be  biologically  plausible.  However,  an  equivalent  result  may  be 
achieved, for the case of $c_\alpha$, by gradient descent that can be implemented through a Hebbian
mechanism  (see  [18]). 
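
For fixed centers, the one-shot computation of the coefficients can be illustrated with a small least-squares solver. The sketch below forms the normal equations, which give the pseudoinverse solution when the matrix of unit activities has full column rank; this is one standard way to obtain it and is not claimed to be the method used in the simulations. Unit-width Gaussian units, the toy data and all function names are assumptions.

```python
import math

def design_matrix(xs, centers):
    """Row i holds the activities of all basis units for example i."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(x, t))) for t in centers]
            for x in xs]

def solve(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def least_squares_coeffs(xs, ys, centers):
    """Solve (H^T H) c = H^T y, valid when H has full column rank."""
    h = design_matrix(xs, centers)
    n = len(centers)
    hth = [[sum(row[i] * row[j] for row in h) for j in range(n)] for i in range(n)]
    hty = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(n)]
    return solve(hth, hty)

# Overconstrained case: four examples, two centers.
xs = [[-1.0], [-0.5], [0.5], [1.0]]
ys = [-1.0, -1.0, 1.0, 1.0]
c = least_squares_coeffs(xs, ys, [[-0.75], [0.75]])

# Square case: as many examples as centers, so the examples are fitted exactly.
xs2, ys2 = [[0.0], [1.0]], [1.0, -1.0]
c2 = least_squares_coeffs(xs2, ys2, xs2)
fit2 = [sum(ci * v for ci, v in zip(c2, row)) for row in design_matrix(xs2, xs2)]
```

In the square case the result coincides with the solution of the linear system mentioned above; in the overconstrained case it minimizes the summed squared error over the examples.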

In the overconstrained case repositioning the HyperBF centers $\mathbf{t}_\alpha$ through gradient descent
can  also  improve  the  module's  performance  (see  also  section  2.4).  To  cite  a  concrete  example,  we 
have  trained  a  20-center  network  with  50  vernier  examples,  achieving  mean  error  of  0.67  ±0.07. 
After  20  steps  of  gradient  descent,  the  error  dropped  to  0.045  ±  0.006. 

In a more realistic situation, the HyperBF module should be allowed to improve its performance
not only during specially designated training trials, but also throughout the testing
stage, as is the case for our incremental learning algorithm described in the main text. Our
simulation is based on a random search method described in section 2.4, and on augmenting
the HyperBF module with a Widrow-Hoff learning mechanism (see [25]), in which the coefficients
$c_\alpha$ are modified according to the following formula:

$$\mathbf{c}^{t+1} = \mathbf{c}^{t} + \gamma \, (f^{t} - \hat{f}^{t}) \, \mathbf{h}^{t}$$

where $f^{t}$ and $\hat{f}^{t}$ are the correct and the estimated output values at trial $t$, and $\mathbf{h}^{t}$ is the vector
of intermediate-layer values (which are the activities of the basis units). In other words, the coefficients
$c_\alpha$ are modified by an amount proportional to the error made in the current trial. It
has been shown [25,11] that the Widrow-Hoff mechanism is equivalent to an incremental computation
of the appropriate pseudoinverse. In our simulations, mean error typically improved
by  0.004  per  trial  for  about  100  trials  (as  found  by  a  linear  regression  of  error  on  trial  number), 
then  became  constant.2 

2 These figures varied with the coefficient γ of the Widrow-Hoff equation.
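
The Widrow-Hoff rule is short enough to state as code. The sketch below assumes a scalar output and a hypothetical function name; the values of γ, of the activity vector and of the target are chosen only for illustration and are not those of the simulations.

```python
def widrow_hoff_update(coeffs, h, target, gamma=0.5):
    """Return coefficients moved by gamma * (target - estimate) * h."""
    estimate = sum(c * hi for c, hi in zip(coeffs, h))
    err = target - estimate
    return [c + gamma * err * hi for c, hi in zip(coeffs, h)]

# Repeated presentation of one example: activities h, correct output +1.
h = [0.9, 0.2]
coeffs = [0.0, 0.0]
errors = []
for _ in range(30):
    errors.append(abs(1.0 - sum(c * hi for c, hi in zip(coeffs, h))))
    coeffs = widrow_hoff_update(coeffs, h, target=1.0)
```

For a single repeated example the error shrinks by the constant factor 1 - γ|h|² on every trial, so the size of γ directly controls the rate of improvement.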